In the data set, each observation is characterized by three categorical (qualitative) variables describing the features of the apartments sold: town, flat model, and flat type.
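library(dplyr)   # select(), mutate(), row_number()
library(knitr)   # kable()
# Keep only the three categorical variables used in the analysis
dane <- dane %>% select(town, flat_model, flat_type)
dane$town <- as.factor(dane$town)
dane$flat_model <- as.factor(dane$flat_model)
dane$flat_type <- as.factor(dane$flat_type)
dane <- na.omit(dane)
# id is the rank of each row by town (ties broken by row order), stored as a factor
dane <- dane %>% mutate(id = row_number(town))
dane$id <- as.factor(dane$id)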
kable(dane[1:10,], align = "cc", caption = "Table 1. The first 10 rows of data.")
| town | flat_model | flat_type | id |
|---|---|---|---|
| ANG MO KIO | New Generation | 3 ROOM | 1 |
| ANG MO KIO | New Generation | 3 ROOM | 2 |
| ANG MO KIO | New Generation | 4 ROOM | 3 |
| BEDOK | New Generation | 3 ROOM | 128 |
| BEDOK | New Generation | 3 ROOM | 129 |
| BEDOK | New Generation | 4 ROOM | 130 |
| BUKIT MERAH | Standard | 2 ROOM | 323 |
| BUKIT PANJANG | Premium Apartment | 4 ROOM | 365 |
| CENTRAL AREA | Standard | 3 ROOM | 404 |
| CLEMENTI | New Generation | 3 ROOM | 429 |
The purpose of market basket analysis is to determine which combinations of products or services are most often purchased together by customers. It is based on association rules, which describe a pattern of the form "if A has occurred, then B will also occur" together with the probability that the pattern holds. Such rules make it possible to predict the simultaneous occurrence of two interdependent phenomena or behaviors.
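The three measures usually reported for a rule $A \Rightarrow B$ can be written down compactly. The block below uses the standard textbook definitions; the symbols $N$ (number of transactions) and $\operatorname{count}(\cdot)$ (number of transactions containing an itemset) are notation introduced here for illustration.

```latex
\begin{aligned}
\operatorname{supp}(A \Rightarrow B) &= \frac{\operatorname{count}(A \cup B)}{N},\\[4pt]
\operatorname{conf}(A \Rightarrow B) &= \frac{\operatorname{count}(A \cup B)}{\operatorname{count}(A)},\\[4pt]
\operatorname{lift}(A \Rightarrow B) &= \frac{\operatorname{conf}(A \Rightarrow B)}{\operatorname{count}(B)/N}.
\end{aligned}
```

A lift of 1 corresponds to independence of $A$ and $B$; values above 1 mean the two occur together more often than independence would predict.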
The first stage is to build a basket consisting of data describing the apartments sold and their features.
library(arules)   # provides the "transactions" class
as_trans <- dane[c("town", "flat_model", "flat_type")]
# Every column must be a factor so that each level becomes one item
as_trans[] <- lapply(as_trans, as.factor)
# Each row becomes one transaction; each "column=level" pair becomes one item
basket <- as(as_trans, "transactions")
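To make the conversion concrete, here is a minimal base R sketch of how a data frame of factors maps onto baskets. It uses a small hypothetical data frame `toy` (not the housing data) and imitates the `column=level` item labels that appear in the rule output; it is an illustration of the idea, not the internal implementation of `arules`.

```r
# Hypothetical toy data: two categorical columns, three rows
toy <- data.frame(
  town = c("BEDOK", "PUNGGOL", "BEDOK"),
  flat_type = c("3 ROOM", "4 ROOM", "4 ROOM"),
  stringsAsFactors = TRUE
)

# Character matrix with one row per observation
m <- as.matrix(toy)

# One basket per row; each item is labelled "column=level"
baskets <- lapply(seq_len(nrow(m)), function(i) {
  paste0(colnames(m), "=", m[i, ])
})

# Distinct items across all baskets
distinct_items <- sort(unique(unlist(baskets)))

print(baskets[[1]])
print(distinct_items)
```

Every basket built this way contains exactly one item per column, which is why each of the 1224 housing transactions holds 3 items.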
Now we can display the basic information of our basket.
basket
## transactions in sparse format with
## 1224 transactions (rows) and
## 32 items (columns)
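Total number of item occurrences: 3672 (1224 × 3)
Number of baskets: 1224
Every basket contains exactly 3 items (one town, one flat model, and one flat type); 32 distinct items occur across all baskets.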
The chart below presents the 10 most frequent items in the housing market data.
itemFrequencyPlot(basket, topN=10, type="relative", cex.names=0.8)
Figure 1. Items Frequency.
We can see that in the residential market, the greatest number of properties sold belong to the “New Generation” model and four-room apartments.
The next step of the analysis is to induce rules from the frequent itemsets. We set the minimum support (supp) to 0.05 and keep the default minimum confidence of 0.8.
rules<-apriori(basket, parameter = list(supp=.05))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 61
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[32 item(s), 1224 transaction(s)] done [0.00s].
## sorting and recoding items ... [14 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [7 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
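The log above reports that only 14 of the 32 items survive the support threshold. The following base R sketch illustrates that first filtering pass on hypothetical toy baskets (not the housing data); `baskets`, `min_support`, and the threshold of 0.5 are assumptions chosen for illustration, and the sketch covers only single items, not the full Apriori search.

```r
# Hypothetical toy baskets, labelled in the same "column=level" style
baskets <- list(
  c("town=BEDOK", "flat_type=3 ROOM"),
  c("town=BEDOK", "flat_type=4 ROOM"),
  c("town=PUNGGOL", "flat_type=4 ROOM"),
  c("town=BEDOK", "flat_type=4 ROOM")
)
min_support <- 0.5   # illustrative threshold, not the 0.05 used in the analysis

# Relative frequency of each single item
n <- length(baskets)
item_support <- table(unlist(baskets)) / n

# Items that meet the threshold; only these can appear in frequent itemsets
frequent_items <- names(item_support)[item_support >= min_support]
print(frequent_items)

# For the housing data, the threshold expressed as a number of transactions
housing_threshold <- 0.05 * 1224
print(housing_threshold)   # apriori reports an absolute minimum count of 61
```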
redundant_rules<-is.redundant(rules)
#summary(redundant_rules)
Table 2. Inspection of the rules.
inspect(rules)
## lhs rhs support confidence coverage lift count
## [1] {town=SENGKANG} => {flat_model=Premium Apartment} 0.08660131 1.0000000 0.08660131 2.928230 106
## [2] {town=PUNGGOL} => {flat_model=Premium Apartment} 0.09558824 1.0000000 0.09558824 2.928230 117
## [3] {town=ANG MO KIO} => {flat_model=New Generation} 0.10212418 0.9842520 0.10375817 2.206455 125
## [4] {town=SENGKANG,
## flat_type=4 ROOM} => {flat_model=Premium Apartment} 0.05228758 1.0000000 0.05228758 2.928230 64
## [5] {town=PUNGGOL,
## flat_type=4 ROOM} => {flat_model=Premium Apartment} 0.05882353 1.0000000 0.05882353 2.928230 72
## [6] {town=BEDOK,
## flat_type=3 ROOM} => {flat_model=New Generation} 0.05555556 0.9577465 0.05800654 2.147036 68
## [7] {town=ANG MO KIO,
## flat_type=3 ROOM} => {flat_model=New Generation} 0.07761438 1.0000000 0.07761438 2.241758 95
We can see that for the value of 0.05 of the parameter supp we get 7 rules.
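The redundancy check computed earlier with `is.redundant` can be illustrated by hand on these 7 rules. The sketch below assumes the definition given in the arules documentation (a rule is redundant if a more general rule with the same consequent has the same or higher confidence); the helper `is_redundant_sketch` is hypothetical and written only for this illustration, with the rules and confidence values copied from Table 2.

```r
# Rules copied from Table 2: antecedent (lhs), consequent (rhs), confidence
lhs <- list(
  "town=SENGKANG",
  "town=PUNGGOL",
  "town=ANG MO KIO",
  c("town=SENGKANG", "flat_type=4 ROOM"),
  c("town=PUNGGOL", "flat_type=4 ROOM"),
  c("town=BEDOK", "flat_type=3 ROOM"),
  c("town=ANG MO KIO", "flat_type=3 ROOM")
)
rhs <- c(
  rep("flat_model=Premium Apartment", 2), "flat_model=New Generation",
  rep("flat_model=Premium Apartment", 2), rep("flat_model=New Generation", 2)
)
confidence <- c(1, 1, 0.9842520, 1, 1, 0.9577465, 1)

# Rule i is flagged when some other rule j has the same rhs, a strictly
# smaller lhs contained in lhs[[i]], and a confidence at least as high
is_redundant_sketch <- function(i) {
  any(vapply(seq_along(lhs), function(j) {
    j != i &&
      rhs[j] == rhs[i] &&
      length(lhs[[j]]) < length(lhs[[i]]) &&
      all(lhs[[j]] %in% lhs[[i]]) &&
      confidence[j] >= confidence[i]
  }, logical(1)))
}

redundant <- vapply(seq_along(lhs), is_redundant_sketch, logical(1))
print(which(redundant))
```

Under this definition the sketch flags rules 4 and 5: adding "4 ROOM" to Sengkang or Punggol does not raise a confidence that is already 1. Rule 7 is kept because restricting Ang Mo Kio to 3-room flats raises the confidence from 0.98 to 1.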
The table above lists the rules that meet both thresholds. Support can range from 0 to 1, and the highest value here is only about 0.10, so the basket does not have one dominant purchasing pattern. The lhs and rhs columns give the antecedent and the consequent of each rule: if the items in lhs occur in a transaction, the item in rhs tends to occur as well. Lift compares how often lhs and rhs occur together with how often they would occur together if they were independent; values above 1 indicate a positive association. The rule with the highest support is the purchase of a New Generation flat in Ang Mo Kio: about 10% of all transactions contain both items. According to the confidence, 98% of the flats sold in Ang Mo Kio are of the New Generation model. Moreover, this combination occurs 2.21 times more often than we would expect if the two housing characteristics were independent.
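The figures for the rule {town=ANG MO KIO} => {flat_model=New Generation} can be checked by hand. In the sketch below, the count of 125 and the total of 1224 come directly from the output; the 127 Ang Mo Kio transactions follow from the reported coverage (0.10375817 × 1224), and the 546 New Generation transactions are an assumption inferred from the reported lift rather than a number printed in the output.

```r
n_total <- 1224  # all transactions
n_both  <- 125   # Ang Mo Kio AND New Generation (count column)
n_lhs   <- 127   # Ang Mo Kio, derived from coverage * n_total
n_rhs   <- 546   # New Generation, inferred from the reported lift (assumption)

# Share of all transactions that contain both items
support <- n_both / n_total

# Share of Ang Mo Kio transactions that are New Generation
confidence <- n_both / n_lhs

# Confidence relative to the overall share of New Generation flats
lift <- confidence / (n_rhs / n_total)

print(round(c(support = support, confidence = confidence, lift = lift), 4))
```

The three values agree with the row reported for this rule (0.1021, 0.9843, and 2.2065 after rounding).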
Our results can also be presented graphically. The graph below shows the 7 rules obtained. The arrows show the direction of each rule, the size of a circle reflects the support, and its color reflects the lift.
topRules<-rules#[1:10]
plot(topRules, method="graph", engine = "htmlwidget")
Figure 2. A graph showing the division into rules.
arulesViz::plotly_arules(topRules, method="matrix", measure=c("support","confidence"))
Figure 3. The strength of the rule measured by a lift.
The redder the color, the stronger the association rule. The strength of a rule is measured by lift, and the layout depends on the values of support and confidence.
The next chart shows the same information but in a different graphical way:
plot(topRules, method = "grouped")
Figure 4. Grouped matrix for 7 rules.
Now we will focus on a particular item of our basket. The town of Ang Mo Kio appears in the rule with the highest support, so we restrict the analysis to rules containing this item.
one_item <-subset(rules, items %in% "town=ANG MO KIO")
inspect(one_item)
## lhs rhs support confidence coverage lift count
## [1] {town=ANG MO KIO} => {flat_model=New Generation} 0.10212418 0.984252 0.10375817 2.206455 125
## [2] {town=ANG MO KIO,
## flat_type=3 ROOM} => {flat_model=New Generation} 0.07761438 1.000000 0.07761438 2.241758 95
plot(one_item, method="graph", measure="lift",shading="confidence")
Figure 5. Graph for 2 rules.
The algorithm returns two rules, which confirm that buyers of a flat in Ang Mo Kio are most likely to choose a New Generation flat.
plot(one_item, method="paracoord")
Figure 6. Parallel coordinates plot for 2 rules.
The last plot shows parallel coordinates. It traces, for each rule, the sequence of features on the left-hand side that leads to the feature on the right-hand side.
This paper presents a market basket analysis of the housing market. The method examines buying patterns by identifying associations among the items that customers place in their shopping baskets. The analysis showed that the most frequently sold apartments are those of the ‘New Generation’ model. The rule with the highest support is that apartments bought in Ang Mo Kio are of the ‘New Generation’ model. Another common dependence is the purchase of Premium Apartments in Punggol or Sengkang. In Punggol, the frequent combination is a 4-room Premium Apartment, while in Bedok it is a 3-room ‘New Generation’ flat. The analysis allows us to examine the housing market and consumer behaviour, which in turn makes it possible to select more relevant offers for a potential buyer.